Description

Background & Context

Thera Bank recently saw a steep decline in the number of credit card users. Credit cards are a good source of income for banks because of the various fees they charge, such as annual fees, balance transfer fees, cash advance fees, late payment fees, foreign transaction fees, and others. Some fees are charged to every user irrespective of usage, while others are charged only under specified circumstances.

Customers leaving the credit card services would lead to losses for the bank, so the bank wants to analyze customer data to identify which customers will leave and why, so that it can improve in those areas.

As a data scientist at Thera Bank, you need to come up with a classification model that will help the bank improve its services so that customers do not renounce their credit cards.

Objective

- Explore and visualize the dataset.
- Build a classification model to predict whether a customer is going to churn.
- Optimize the model using appropriate techniques.
- Generate a set of insights and recommendations that will help the bank.

Data Dictionary:

Importing Libraries

Getting a five-point summary of the data
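A five-point summary is the minimum, first quartile, median, third quartile, and maximum of each numeric column. A minimal pandas sketch, assuming the data sits in a DataFrame called `df`; the column names `Customer_Age` and `Credit_Limit` and the random values are hypothetical stand-ins for the real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric columns standing in for the real customer data.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Customer_Age": rng.integers(26, 74, size=200),
    "Credit_Limit": rng.uniform(1_000, 35_000, size=200).round(0),
})

# describe() returns count, mean, std, min, 25%, 50%, 75% and max;
# the five-point summary is the min, the three quartiles and the max.
five_point = df.describe().loc[["min", "25%", "50%", "75%", "max"]]
print(five_point.T)
```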

Data Preprocessing

EDA

Bivariate Analysis
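One common bivariate view for a churn problem is the churn rate within each level of a categorical feature. A sketch using `pd.crosstab`, with hypothetical column names (`Card_Category`, `Attrition_Flag`) and made-up rows:

```python
import pandas as pd

# Toy rows; "Card_Category" and "Attrition_Flag" are assumed column names.
df = pd.DataFrame({
    "Card_Category": ["Blue", "Blue", "Blue", "Silver", "Silver", "Gold"],
    "Attrition_Flag": ["Existing", "Attrited", "Existing",
                       "Existing", "Attrited", "Existing"],
})

# normalize="index" turns counts into the share of each outcome per card
# category, so every row sums to 1.
churn_by_card = pd.crosstab(df["Card_Category"], df["Attrition_Flag"], normalize="index")
print(churn_by_card)
```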

Data Preparation

Setting the "Unknown" education level to null. We will then use a KNN imputer to fill each missing value from the most similar customers.
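A minimal sketch of the relabelling step, assuming the column is called `Education_Level` (a hypothetical name) and that missing education is stored as the literal string "Unknown":

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Education_Level": ["Graduate", "Unknown", "High School",
                                       "Unknown", "Doctorate"]})

# Treat the "Unknown" label as missing so an imputer can fill it later.
df["Education_Level"] = df["Education_Level"].replace("Unknown", np.nan)
print(df["Education_Level"].isna().sum())
```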

Splitting data into train and test sets
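A sketch of a stratified split with scikit-learn on synthetic data; the 30% test size and the roughly 15% churn rate are assumptions, not values from the notebook. Stratifying on the target keeps the churn rate about the same in both splits, and splitting before imputation keeps test rows from influencing the imputer:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and an imbalanced target (about 15% churners, an assumed rate).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.15).astype(int)

# stratify=y preserves the class ratio; test_size=0.3 is an assumed ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)
print(X_train.shape, X_test.shape, y_train.mean().round(3), y_test.mean().round(3))
```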

Missing value treatment
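`KNNImputer` fills a missing entry with the mean of that feature over the `n_neighbors` closest rows, measuring distance on the features that are present. It only accepts numbers, so a text column such as education has to be mapped to codes first. A sketch under those assumptions: the ordinal mapping, the column names, and `n_neighbors=3` are hypothetical, and the imputer is fitted on the training split only:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical ordinal coding of education levels (the ordering is an assumption).
edu_order = {"Uneducated": 0, "High School": 1, "College": 2,
             "Graduate": 3, "Post-Graduate": 4, "Doctorate": 5}

train = pd.DataFrame({
    "Customer_Age": [30, 32, 34, 50, 52, 54, 33, 51],
    "Education_Level": ["Graduate", "Graduate", "Graduate", "High School",
                        "High School", "High School", np.nan, np.nan],
})
test = pd.DataFrame({"Customer_Age": [53, 31], "Education_Level": [np.nan, "Graduate"]})

# KNNImputer needs numeric input, so map the labels to codes (NaN stays NaN).
for frame in (train, test):
    frame["Education_Level"] = frame["Education_Level"].map(edu_order)

# Fit on the training split only, then reuse the fitted imputer on the test split.
imputer = KNNImputer(n_neighbors=3)
train_imp = pd.DataFrame(imputer.fit_transform(train), columns=train.columns)
test_imp = pd.DataFrame(imputer.transform(test), columns=test.columns)

# A neighbour average can be fractional, so round back to a valid code.
for frame in (train_imp, test_imp):
    frame["Education_Level"] = frame["Education_Level"].round()

print(train_imp["Education_Level"].tolist())  # [3.0, 3.0, 3.0, 1.0, 1.0, 1.0, 3.0, 1.0]
print(test_imp["Education_Level"].tolist())   # [1.0, 3.0]
```

The 33-year-old with missing education takes the "Graduate" code of the three closest ages (30, 32, 34), while the 51-year-old takes "High School".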

Encoding Categorical Variables
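A sketch of one-hot encoding with `pd.get_dummies`, using hypothetical column names. If train and test are encoded separately, their columns should be aligned afterwards (for example with `DataFrame.reindex`), because a level missing from one split would otherwise drop a column:

```python
import pandas as pd

# Hypothetical columns: two nominal features and one numeric feature.
df = pd.DataFrame({
    "Gender": ["M", "F", "F", "M"],
    "Card_Category": ["Blue", "Gold", "Blue", "Silver"],
    "Customer_Age": [45, 39, 51, 33],
})

# One-hot encode the nominal columns; drop_first=True drops one level per
# column, which avoids perfectly collinear dummies in logistic regression.
encoded = pd.get_dummies(df, columns=["Gender", "Card_Category"], drop_first=True)
print(encoded.columns.tolist())
```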

Model Building

Metric of Choice
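The results that follow are reported as recall: of the customers who actually churned, the share the model flags. Recall is TP / (TP + FN), so it penalises missed churners (false negatives), which is presumably why it suits this problem. A small check on toy labels, where 1 means churned:

```python
from sklearn.metrics import confusion_matrix, recall_score

# Toy labels: 1 = churned (attrited), 0 = stayed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
manual_recall = tp / (tp + fn)  # share of actual churners the model caught
print(manual_recall, recall_score(y_true, y_pred))  # both equal 0.75
```

The false positive at index 4 does not affect recall; only the missed churner at index 3 does.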

Logistic Regression

Using K-Fold Cross-Validation

The model's cross-validated recall ranges from 0.30 to 0.39.
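A sketch of this cross-validation loop on synthetic, imbalanced data. It uses `StratifiedKFold` rather than plain `KFold` so every fold keeps roughly the same churn rate; the 5 folds, `max_iter`, and the data are assumptions. The minimum and maximum of `scores` give a per-fold recall range of the kind quoted above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in for the training split (about 15% positives).
X_train, y_train = make_classification(
    n_samples=1000, n_features=8, n_informative=4, weights=[0.85], random_state=1
)

# Stratified folds keep the class ratio similar in every fold.
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_train, y_train, scoring="recall", cv=kfold
)
print(scores.round(3), scores.min(), scores.max())
```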

Undersampling the training data
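Random undersampling keeps all churners and discards non-churners at random until the classes are balanced. It is applied to the training split only; the test split keeps its real class ratio. A NumPy sketch on synthetic data (imbalanced-learn's `RandomUnderSampler` does the same job; which tool the notebook used is not shown here):

```python
import numpy as np

# Synthetic training split: 150 churners (1) and 850 non-churners (0).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 3))
y_train = np.r_[np.ones(150, dtype=int), np.zeros(850, dtype=int)]

# Keep every churner and an equal-sized random sample of non-churners.
minority_idx = np.flatnonzero(y_train == 1)
majority_idx = np.flatnonzero(y_train == 0)
kept_majority = rng.choice(majority_idx, size=minority_idx.size, replace=False)
keep = np.concatenate([minority_idx, kept_majority])
rng.shuffle(keep)  # mix the classes so row order carries no signal

X_under, y_under = X_train[keep], y_train[keep]
print(X_under.shape, np.bincount(y_under))  # (300, 3) [150 150]
```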

Logistic Regression on Undersampled Data

Recall improves on the training set and ranges from 0.73 to 0.741.

The model's performance appears to have improved: recall on the test set is 0.70. There is also no sign of overfitting, since test recall (0.70) is close to train recall (0.73 to 0.741).

Oversampling using SMOTE
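SMOTE creates synthetic churners by picking a minority sample, picking one of its k nearest minority neighbours, and placing a new point at a random position on the line between them. In practice this is usually done with imbalanced-learn's `SMOTE(...).fit_resample(X, y)`; the sketch below is a simplified reimplementation of the core idea with NumPy and scikit-learn only, so it runs without that package. The helper name `smote`, `k=5`, and the synthetic data are assumptions, and it ignores categorical features:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic minority rows by
    interpolating between a minority sample and one of its k nearest
    minority neighbours."""
    rng = np.random.default_rng(seed)
    # Ask for k + 1 neighbours because each point's nearest neighbour is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, neighbours = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)                  # seed samples
    picked = neighbours[base, rng.integers(1, k + 1, size=n_new)]   # skip column 0 (self)
    gap = rng.random((n_new, 1))                                    # position on the segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Synthetic training split: 150 churners (1) and 850 non-churners (0).
rng = np.random.default_rng(1)
X_train = rng.normal(size=(1000, 3))
y_train = np.r_[np.ones(150, dtype=int), np.zeros(850, dtype=int)]

X_min = X_train[y_train == 1]
X_new = smote(X_min, n_new=700)  # 150 + 700 = 850, matching the majority class
X_over = np.vstack([X_train, X_new])
y_over = np.r_[y_train, np.ones(len(X_new), dtype=int)]
print(np.bincount(y_over))  # [850 850]
```

As with undersampling, only the training split is resampled.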

Logistic Regression on Oversampled Data

Regularization
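Regularization adds a penalty on coefficient size; in scikit-learn's `LogisticRegression` the strength is set by `C`, the inverse of the penalty weight, so a smaller `C` means a stronger penalty. A sketch comparing L2 and L1 penalties on synthetic data with 20 features, only 4 of them informative; `C=0.1` is an arbitrary illustrative value:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, n_informative=4, random_state=0)

# L2 shrinks all coefficients; L1 can push weak ones to exactly zero.
l2_model = LogisticRegression(penalty="l2", C=0.1, max_iter=1000).fit(X, y)
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)

print("non-zero L2 coefs:", (l2_model.coef_ != 0).sum())
print("non-zero L1 coefs:", (l1_model.coef_ != 0).sum())
```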

Performing Hyperparameter Tuning using GridSearch
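A sketch of an exhaustive grid search over the logistic regression's regularization settings, scored on recall. The grid values and synthetic data are assumptions; `liblinear` is chosen because it supports both `l1` and `l2`:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = make_classification(n_samples=600, n_features=10, weights=[0.85], random_state=2)

# Every combination is tried: 4 values of C x 2 penalties = 8 candidates.
param_grid = {"C": [0.01, 0.1, 1, 10], "penalty": ["l1", "l2"]}
search = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    param_grid,
    scoring="recall",  # optimise recall rather than accuracy
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=2),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```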

Tuning XGBoost

Randomized Search CV on XGBoost
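A sketch of a randomized search for a gradient-boosted model. To keep it runnable with scikit-learn alone, it uses `GradientBoostingClassifier` as a stand-in for XGBoost; `xgboost.XGBClassifier` follows the scikit-learn estimator API and accepts parameters with the same names used below, so it could be dropped into the same search. The ranges, `n_iter=10`, and the data are assumptions:

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=600, n_features=10, weights=[0.85], random_state=3)

# Distributions to sample from; uniform(loc, scale) covers [loc, loc + scale].
param_distributions = {
    "n_estimators": randint(50, 201),       # integers 50..200
    "max_depth": randint(2, 6),             # integers 2..5
    "learning_rate": uniform(0.01, 0.29),   # 0.01 to 0.30
    "subsample": uniform(0.6, 0.4),         # 0.6 to 1.0
}
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=3),  # stand-in for XGBClassifier
    param_distributions,
    n_iter=10,          # only 10 sampled combinations, unlike an exhaustive grid
    scoring="recall",
    cv=3,
    random_state=3,
)
search.fit(X, y)
print(search.best_params_)
```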

Grid Search on Bagging Classifier
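A sketch of a grid search over `BaggingClassifier`, which uses decision trees as its base estimator by default; the grid and synthetic data are assumptions. A randomized search over bagging differs only in swapping `GridSearchCV` for `RandomizedSearchCV` and setting `n_iter`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, weights=[0.85], random_state=4)

param_grid = {
    "n_estimators": [10, 30],
    "max_samples": [0.7, 1.0],   # fraction of rows drawn for each tree
    "max_features": [0.7, 1.0],  # fraction of columns drawn for each tree
}
search = GridSearchCV(BaggingClassifier(random_state=4), param_grid, scoring="recall", cv=3)
search.fit(X, y)
print(len(search.cv_results_["params"]), search.best_params_)  # 8 combinations tried
```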

Randomized Search on Bagging Classifier

Grid Search on Decision Tree

Randomized Search for Decision Tree
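A sketch of a randomized search over decision-tree pruning parameters, followed by a check of the best tree on a held-out test split, the same train-versus-test comparison used earlier to judge overfitting. The ranges, `n_iter`, and the synthetic data are assumptions:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, n_features=10, weights=[0.85], random_state=5)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=5
)

param_distributions = {
    "max_depth": randint(2, 11),          # integers 2..10
    "min_samples_leaf": randint(1, 21),   # integers 1..20
    "class_weight": [None, "balanced"],   # "balanced" up-weights the minority class
}
search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=5), param_distributions,
    n_iter=20, scoring="recall", cv=5, random_state=5,
)
search.fit(X_train, y_train)

# Compare cross-validated training recall with recall on the untouched test split.
test_recall = recall_score(y_test, search.best_estimator_.predict(X_test))
print(round(search.best_score_, 3), round(test_recall, 3))
```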

Business Insights